A Hybrid System for Continuous Word-Level Emphasis Modeling Based on HMM State Clustering and Adaptive Training

نویسندگان

  • Quoc Truong Do
  • Tomoki Toda
  • Graham Neubig
  • Sakriani Sakti
  • Satoshi Nakamura
چکیده

Emphasis is an important aspect of speech that conveys the focus of utterances, and modeling of this emphasis has been an active research field. Previous work has modeled emphasis using state clustering with an emphasis contextual factor indicating whether or not a word is emphasized. In addition, cluster adaptive training (CAT) makes it possible to directly optimize model parameters for clusters with different characteristics. In this paper, we first make a straightforward extension of CAT to emphasis adaptive training using continuous emphasis representations. We then compare it to state clustering, and propose a hybrid approach that combines both the emphasis contextual factor and adaptive training. Experiments demonstrated the effectiveness of adaptive training both stand-alone or combined with the state clustering approach (hybrid system) with it improving emphasis estimation by 2-5% F -measure and producing more natural audio.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Segmental Neural Net Optimization for Continuous Speech Recognition

Previously, we had developed the concept of a Segmental Neural Net (SNN) for phonetic modeling in continuous speech recognition (CSR). This kind of neural network technology advanced the state-of-the-art of large-vocabulary CSR, which employs Hidden Marlcov Models (HMM), for the ARPA 1oo0-word Resource Management corpus. More Recently, we started porting the neural net system to a larger, more ...

متن کامل

Context adaptive training with factorized decision trees for HMM-based speech synthesis

To achieve natural high quality synthesised speech in HMMbased speech synthesis, the effective modelling of complex acoustic and linguistic contexts is critical. Traditional approaches use context-dependent HMMs with decision tree based parameter clustering to model the full combination of contexts. However, weak contexts, such as word-level emphasis in neutral speech, are difficult to capture ...

متن کامل

Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis

To achieve natural high quality synthesized speech in HMM-based speech synthesis, the effective modelling of complex acoustic and linguistic contexts is critical. Traditional approaches use context-dependent HMMs with decision tree based parameter clustering to model the full combinatorial of contexts. However, weak contexts, such as word-level emphasis in natural speech, are difficult to captu...

متن کامل

Context-dependent hybrid HME/HMM speech recognition using polyphone clustering decision trees

This paper presents a context-dependent hybrid connectionist speech recognition system that uses a set of generalized hierarchical mixtures of experts (HME) to estimate context-dependent posterior acoustic class probabilities. The connectionist part of the system is organized in a modular fashion, allowing the distributed training of such a system on regular workstations. Context classes are ba...

متن کامل

A New Fast and Efficient HMM-Based Face Recognition System Using a 7-State HMM Along With SVD Coefficients

In this paper, a new Hidden Markov Model (HMM)-based face recognition system is proposed. As a novel point despite of five-state HMM used in pervious researches, we used 7-state HMM to cover more details. Indeed we add two new face regions, eyebrows and chin, to the model. As another novel point, we used a small number of quantized Singular Values Decomposition (SVD) coefficients as feature...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016